combined_ftest_5x2cv: 5x2cv combined F test for classifier comparisons

https://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/

Proposed by Alpaydin as a more robust alternative to Dietterich's 5x2cv paired t-test procedure.

(TODO: paired_ttest_5x2cv: 5x2cv paired t test for classifier comparisons)

To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D.

Setup: classifiers A and B, and a labeled dataset D.

In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set.

In the 5x2cv paired t test, we repeat the splitting (50% training and 50% test data) 5 times.

That is: repeat a 2-way split (50% training data, 50% test data) 5 times.

(💡 This splitting scheme also differs from the one in (TODO: transcribe) Nested CV for algorithm selection in scikit-learn.)

In each of the 5 iterations, we fit A and B to the training split and evaluate their performance (pA and pB) on the test split.

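That is: in each of the 5 repetitions, A and B are trained on the training split and scored (pA and pB) on the test split.

Then, we rotate the training and test sets (the training set becomes the test set and vice versa) and compute the performance again, which results in 2 performance difference measures: $p^{(1)} = p^{(1)}_A - p^{(1)}_B$ and $p^{(2)} = p^{(2)}_A - p^{(2)}_B$.

That is: swap the training and test sets and compute the performance again.

As a result, 2 performance differences are measured.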
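The splitting and rotation described above can be sketched roughly as follows. This is only an illustration, not the mlxtend code: the choice of iris, `LogisticRegression(max_iter=1000)`, the helper name `score_diff`, and using the loop index as the split seed are all my own assumptions, and accuracy (the default `score` of scikit-learn classifiers) stands in for the performance measure.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf_a = LogisticRegression(max_iter=1000)       # classifier A (assumed settings)
clf_b = DecisionTreeClassifier(random_state=0)  # classifier B (assumed settings)


def score_diff(X_train, y_train, X_test, y_test):
    """Fit A and B on the training half and return p_A - p_B (accuracy) on the test half."""
    p_a = clf_a.fit(X_train, y_train).score(X_test, y_test)
    p_b = clf_b.fit(X_train, y_train).score(X_test, y_test)
    return p_a - p_b


diffs = []  # one (p1, p2) pair per repetition
for seed in range(5):  # 5 repetitions; the seed choice is illustrative
    # 50% / 50% split
    X_1, X_2, y_1, y_2 = train_test_split(X, y, test_size=0.5, random_state=seed)
    p1 = score_diff(X_1, y_1, X_2, y_2)  # train on half 1, test on half 2
    p2 = score_diff(X_2, y_2, X_1, y_1)  # rotate: train on half 2, test on half 1
    diffs.append((p1, p2))
```

After the loop, `diffs` holds 5 pairs, i.e. 10 performance differences in total.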

Then, we estimate mean and variance of the differences:

The mean p̄ is computed, and then the variance s^2 is computed using that mean (for each repetition).
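For reference, the two estimates can be written out as below. The repetition index i is notation I am adding here; the rest follows the definitions of p(1) and p(2) above.

```latex
\bar{p}_i = \frac{p_i^{(1)} + p_i^{(2)}}{2},
\qquad
s_i^2 = \left(p_i^{(1)} - \bar{p}_i\right)^2 + \left(p_i^{(2)} - \bar{p}_i\right)^2,
\qquad i = 1, \dots, 5
```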

The F-statistic proposed by Alpaydin (see paper for justifications) is then computed as (formula omitted), which is approximately F distributed with 10 and 5 degrees of freedom.
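As I understand Alpaydin's definition, the statistic is $f = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2} (p_i^{(j)})^2}{2 \sum_{i=1}^{5} s_i^2}$. A minimal pure-Python sketch of that calculation follows. The function name `combined_f_statistic` and the numbers in `example_diffs` are hypothetical, made up only to exercise the arithmetic; they are not results from any experiment.

```python
def combined_f_statistic(diffs):
    """Compute the 5x2cv combined F statistic.

    diffs: five (p1, p2) pairs of performance differences, one pair per repetition.
    """
    if len(diffs) != 5:
        raise ValueError("5x2cv expects exactly 5 repetitions")
    numerator = 0.0     # sum of all 10 squared differences
    variance_sum = 0.0  # sum of the 5 per-repetition variances
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
        numerator += p1 ** 2 + p2 ** 2
    # Note: raises ZeroDivisionError if p1 == p2 in every repetition.
    return numerator / (2.0 * variance_sum)


# Hypothetical differences, for illustration only.
example_diffs = [(0.02, 0.04), (0.01, 0.03), (0.05, 0.01), (0.00, 0.02), (0.03, 0.03)]
f_stat = combined_f_statistic(example_diffs)
```

The p-value is then the upper tail of an F distribution with 10 and 5 degrees of freedom, which can be obtained with `scipy.stats.f.sf(f_stat, 10, 5)`.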

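Example on the iris dataset (a small dataset): compare a logistic regression classifier with a decision tree classifier.

The null hypothesis is not rejected.

Next, change the decision tree to a shallower, simpler one.

The null hypothesis is rejected.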

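While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, the documentation still goes on to run the test a second time with the simpler decision tree.

(I.e., applying a statistical test repeatedly without a correction for multiple hypothesis testing is generally discouraged.)

(My understanding: the documentation presents the comparison of logistic regression with the simple decision tree as if it were the starting point.)

This example is also used in the test code: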

https://github.com/rasbt/mlxtend/blob/v0.21.0/mlxtend/evaluate/tests/test_combined_ftest_5x2cv.py#L17

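Is it OK to pass already-fitted classifiers to combined_ftest_5x2cv?

The function calls fit itself: https://github.com/rasbt/mlxtend/blob/v0.21.0/mlxtend/evaluate/f_test.py#L188-L189

Each call to fit resets the estimator and retrains it from scratch (unlike a neural network, where further training continues from the current weights).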

cf. partial_fit (the incremental-training method that some scikit-learn estimators provide)
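The reset behavior of fit can be checked with a small experiment. This is a toy sketch with made-up data: after the second call to fit, the learned `classes_` attribute reflects only the second set of labels, with no trace of the first fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: same features, two unrelated label sets (made up for illustration).
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_first = np.array([0, 0, 1, 1])
y_second = np.array([2, 2, 3, 3])

clf = LogisticRegression()

clf.fit(X_toy, y_first)
classes_after_first = clf.classes_.tolist()   # labels seen in the first fit

clf.fit(X_toy, y_second)
classes_after_second = clf.classes_.tolist()  # only the labels from the second fit

print(classes_after_first, classes_after_second)
```

So handing an already-fitted estimator to a function that calls fit internally should be harmless: whatever it learned before is discarded.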

Reading the implementation makes the exact calculation clear: https://github.com/rasbt/mlxtend/blob/v0.21.0/mlxtend/evaluate/f_test.py#L183-L210